Abstract: A reasonable starting place for developing decision fusion rules for families of classification systems is the logical AND and OR rules. These two rules, along with the unary rule NOT, lead to a Boolean algebra when a number of properties are shown to hold. This paper examines how these rules for classification system families comprise a Boolean algebra of systems. This Boolean algebra of families is then shown, under assumptions of independence, to be isomorphic to a Boolean algebra of Receiver Operating Characteristic (ROC) curves. These decision fusion rules produce ROC curves which become the bounds by which to test non-Boolean, possibly non-decision fusion rules for performance increases. We give an example to demonstrate the usefulness of this Boolean algebra of ROC curves.

Keywords: fusion rules, Boolean algebra, receiver operating characteristic (ROC) curves, fusor, information fusion, optimization.

I. Introduction and Problem Statement

Given a finite number of families of classification systems (with 2-label output), how do we find the best possible decision fusion rule (also called label fusion rule)? In the realm of deterministic rules, Boolean rules comprise (almost) the “whole show” for decision fusion. It has been shown that, under the assumption of independence of classification systems, the Boolean AND operation on families of classification systems induces a corresponding AND operation on the receiver operating characteristic (ROC) curves of each corresponding family [1]. In this paper we show this extends to the Boolean OR operation and the unary NOT operation. With these three operations defined, we develop a Boolean algebra of ROC curves which corresponds to the Boolean algebra of a finite number of families of classification systems. The systems in this Boolean algebra of families are the easiest fused systems a fusion engineer can design, build, test, and evaluate. With the Boolean algebra of ROC curves, one does not have to physically build or test these systems in order to determine their performance and, thus, to determine which design is optimal.

We develop this paper first by defining families of classification systems in Section II, and reviewing how information fusion occurs along the nodes of these families. Section III is devoted to a discussion of the fusion rules used. Section IV briefly reviews Boolean algebras, which are used in Section V, the important results, where we prove that a Boolean algebra of families of classification systems is isomorphic to a Boolean algebra of ROC curves. We give an example in Section VI, and conclude with Section VII.

Several  authors  have  considered  the  Boolean  Algebra  of 
systems  generated  by  ANDing  and  ORing  the  original  systems, 
see  [2],  [3],  and  [4],  to  name  a  few. 

II. Families of Classification Systems

The classification system can be defined mathematically, which allows the fused system to be written in terms of the individual systems. Let $\mathcal{E}$ be a population set of outcomes. Let $\mathfrak{B}$ be a $\sigma$-algebra of subsets of $\mathcal{E}$; then $(\mathcal{E}, \mathfrak{B})$ is a measurable space [5]. Let $P$ be a probability measure defined on $\mathfrak{B}$; then $(\mathcal{E}, \mathfrak{B}, P)$ is a probability measure space. Let $s$ be a sensor that produces data as its output, i.e., $s$ is a mapping of outcomes from the population set $\mathcal{E}$ to a datum. Let $\mathcal{D}$ denote the data set. Then we write $s : \mathcal{E} \to \mathcal{D}$, or as a diagram $\mathcal{E} \xrightarrow{s} \mathcal{D}$. A datum from this data set may take many forms, such as infrared imagery, radar signals, data streams, or video. This data may be too difficult to classify in its current form, so a mapping $p$ defined on $\mathcal{D}$ is used to produce an element $x$, called a feature. Typically, this element $x$ is a vector of real numbers, though it need not be. Let the mapping $p$ represent a processor that takes a datum from $\mathcal{D}$ and produces a feature in the feature set $\mathcal{F}$, i.e., $\mathcal{D} \xrightarrow{p} \mathcal{F}$. Since $x$ might be a vector of real numbers, then $\mathcal{F} \subseteq \mathbb{R}^N$ for some positive integer $N$. Let $\Theta$ be a threshold set (or a set of parameters); for example, $\Theta = [0,1]$ or $\Theta = \mathbb{R} = (-\infty, \infty)$. For each $\theta \in \Theta$ let $a_\theta$ be a classifier mapping $\mathcal{F}$ into a label set $\mathcal{L}$. That is, $a_\theta : \mathcal{F} \to \mathcal{L}$, or $\mathcal{F} \xrightarrow{a_\theta} \mathcal{L}$, for each $\theta \in \Theta$. For a two-class problem, examples of a label set could be $\mathcal{L} = \{\text{true}, \text{false}\}$, $\mathcal{L} = \{T, F\}$, $\mathcal{L} = \{0, 1\}$, or even $\mathcal{L} = \{\text{target}, \text{non-target}\}$. In this paper, we use $\mathcal{L} = \{t, n\}$ where $t$ = “target” and $n$ = “non-target”. The graphical representation of these mappings is given by the following diagram.

$$\mathcal{E} \xrightarrow{s} \mathcal{D} \xrightarrow{p} \mathcal{F} \xrightarrow{a_\theta} \mathcal{L}$$

Define the system $A_\theta$ to be the composition of these mappings for each $\theta \in \Theta$. That is, for each $\theta \in \Theta$, $A_\theta = a_\theta \circ p \circ s$. Graphically, the diagram for the system is written as

$$\mathcal{E} \xrightarrow{A_\theta} \mathcal{L}$$

for each $\theta \in \Theta$.

A.  Two  Classification  Systems 

There  are  many  ways  in  which  to  express  two  (or  more) 
classification  systems.  In  this  paper,  however,  the  multiple 
classification  system  must  be  developed  using  two  main 
premises.  First,  the  systems  to  be  combined  are  fused  together 
using  label  fusion,  that  is,  once  each  system  has  produced 
a  label  for  a  specific  outcome  from  the  event  set,  these 
labels  are  combined  together  to  generate  one  overall  label 
for  that  outcome.  The  creation  of  this  overall  label  from  the 
underlying  classification  systems  defines  how  the  systems  are 
fused  together,  via  the  labels.  Second,  the  label  set  for  all 
systems  considered,  including  each  individual  system  and  the 
fused  classification  system,  contains  two  values  or  two  classes. 
Examples  of  possible  members  of  this  label  set  were  given 
previously,  but  the  label  set  considered  here  is  £  =  {t,n} 
where  t  =  “target” and  n  =  “non-target”.  Using  the  premises 
of  label  fusion  and  a  two-class  label  system,  representations 
for  a  two  classification  system  are  developed. 

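Consider the case when two sensors, $s_1$ and $s_2$, observe outcomes occurring in the same population set $\mathcal{E}$. Assume they produce data in data sets $\mathcal{D}_1$ and $\mathcal{D}_2$, respectively. That is, $s_1 : \mathcal{E} \to \mathcal{D}_1$ and $s_2 : \mathcal{E} \to \mathcal{D}_2$. Further, assume sensors $s_1$ and $s_2$ each have a processor, $p_1$ and $p_2$, respectively, which maps a datum in the respective data set, $\mathcal{D}_1$ or $\mathcal{D}_2$, to a feature in the feature set $\mathcal{F}_1$ or $\mathcal{F}_2$. In particular, assume $p_1 : \mathcal{D}_1 \to \mathcal{F}_1$ and $p_2 : \mathcal{D}_2 \to \mathcal{F}_2$. Suppose that the family of classifiers for $p_1$ and $s_1$ is given by $\{a_\theta : \theta \in \Theta\}$ and that the family of classifiers for $p_2$ and $s_2$ is given by another family, $\{b_\phi : \phi \in \Phi\}$. Let $a_\theta : \mathcal{F}_1 \to \mathcal{L}_1$ for each $\theta \in \Theta$ and $b_\phi : \mathcal{F}_2 \to \mathcal{L}_2$ for each $\phi \in \Phi$. Then the labels that are produced from each of the classification systems are fused together to create an overall label for the outcome of interest. The composition of these mappings yields systems represented by the following diagram.

$$\mathcal{E} \xrightarrow{s_1} \mathcal{D}_1 \xrightarrow{p_1} \mathcal{F}_1 \xrightarrow{a_\theta} \mathcal{L}_1, \qquad \mathcal{E} \xrightarrow{s_2} \mathcal{D}_2 \xrightarrow{p_2} \mathcal{F}_2 \xrightarrow{b_\phi} \mathcal{L}_2$$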


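For these two classification systems the compositions yield the systems $A_\theta = a_\theta \circ p_1 \circ s_1$ for each $\theta \in \Theta$ and $B_\phi = b_\phi \circ p_2 \circ s_2$ for each $\phi \in \Phi$. Thus, the individual diagrams are

$$\mathcal{E} \xrightarrow{A_\theta} \mathcal{L}_1 \quad \text{and} \quad \mathcal{E} \xrightarrow{B_\phi} \mathcal{L}_2,$$

and the two families of classification systems will be denoted by $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ and $\mathbb{B} = \{B_\phi : \phi \in \Phi\}$.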

The two classification systems developed above map outcomes from the population set into different data, feature, and label sets, which are then used to fuse the classification systems together. There are, however, other ways to label outcomes from the event set. In this paper, classification systems can map outcomes into either the same or different data sets or the same or different feature sets. The sets which must remain the same for the mathematical development contained herein are the event set $\mathcal{E}$ and the two-class label set $\mathcal{L}$. Therefore, the classification systems must act on the same event set, map into either the same or different data and feature sets, and eventually map into the same label set. That is,

$$\mathcal{E} \xrightarrow{A_\theta} \mathcal{L} \quad \text{and} \quad \mathcal{E} \xrightarrow{B_\phi} \mathcal{L}.$$


B.  ROC  Curves 

Each mapping in the classification system, as well as the composition of mappings, has a pre-image. Let $f$ be a function mapping a set $\mathcal{X}$ into a set $\mathcal{Y}$, so $f : \mathcal{X} \to \mathcal{Y}$. Given a subset $Y \subseteq \mathcal{Y}$, we define the pre-image of $Y$ under $f$ to be the subset of $\mathcal{X}$ given by

$$f^{\natural}(Y) = \{x \in \mathcal{X} : f(x) \in Y\}.$$

The pre-image is sometimes called the inverse image and written with the superscript $-1$, although the mapping $f$ need not be invertible. Because this construction creates a natural mapping from subsets of $\mathcal{Y}$ into subsets of $\mathcal{X}$, the natural symbol $\natural$ will be used instead of $-1$. Therefore, we write $f^{\natural}(\mathcal{Y}) = \mathcal{X}$. If we consider the entire classification system as a composition of mappings, then we can write the pre-image of a specific label $\ell \in \mathcal{L}$ produced by the classification system $A_\theta$. Let $\mathcal{L}_\ell = \{\ell\}$; then $A_\theta^{\natural}(\mathcal{L}_\ell) = \{e \in \mathcal{E} : A_\theta(e) \in \mathcal{L}_\ell\}$. The use of pre-images allows us to take the resulting labels and express them in terms of the underlying probabilities. This is demonstrated in the development of the ROC curve.

Assume the label set is $\mathcal{L} = \{t, n\}$, where $t$ and $n$ may be real values or symbols, the label $t$ represents a “target”, and the label $n$ represents a “non-target”. Define $\mathcal{L}_t = \{t\}$ and $\mathcal{L}_n = \{n\}$. We assume the event set $\mathcal{E}$ can be partitioned into a target event set containing all target outcomes and a non-target event set containing all non-target outcomes. Denote the true target event set as $\mathcal{E}_t$ and the true non-target event set as $\mathcal{E}_n$. Thus, $\mathcal{E} = \mathcal{E}_t \cup \mathcal{E}_n$ and $\mathcal{E}_t \cap \mathcal{E}_n = \emptyset$.


In order to quantify how well the classification system performs, we appeal to the probability measure space $(\mathcal{E}, \mathfrak{B}, P)$ to compute the following four performance quantifiers. Let $P_{TP}(A_\theta)$ denote the probability of true positive classification of the classification system $A_\theta$. Then $P_{TP}(A_\theta)$ is the probability that the classification system $A_\theta$ labels an outcome, $e$, as a target label, $t$, given that the outcome really is a target outcome from the target event set, $\mathcal{E}_t$. Mathematically, $P_{TP}(A_\theta)$ is defined by the conditional probability

$$P_{TP}(A_\theta) = P\{A_\theta(e) = t \mid e \in \mathcal{E}_t\} = \frac{P\left(A_\theta^{\natural}(\mathcal{L}_t) \cap \mathcal{E}_t\right)}{P(\mathcal{E}_t)}.$$

Let $P_{FP}(A_\theta)$ denote the probability of false positive classification of the system $A_\theta$. Then $P_{FP}(A_\theta)$ is the probability that the classification system $A_\theta$ labels an event outcome, $e$, as a target label, $t$, given that the outcome is really a non-target from the non-target event set, $\mathcal{E}_n$. Mathematically, $P_{FP}(A_\theta)$ is defined by the conditional probability

$$P_{FP}(A_\theta) = P\{A_\theta(e) = t \mid e \in \mathcal{E}_n\} = \frac{P\left(A_\theta^{\natural}(\mathcal{L}_t) \cap \mathcal{E}_n\right)}{P(\mathcal{E}_n)}.$$

Let $P_{TN}(A_\theta)$ denote the probability of true negative classification of the system $A_\theta$. Then $P_{TN}(A_\theta)$ is the probability that the classification system $A_\theta$ labels an event outcome, $e$, as a non-target label, $n$, given that the outcome really is a non-target outcome from the non-target event set, $\mathcal{E}_n$. Mathematically, $P_{TN}(A_\theta)$ is defined by the conditional probability

$$P_{TN}(A_\theta) = P\{A_\theta(e) = n \mid e \in \mathcal{E}_n\} = \frac{P\left(A_\theta^{\natural}(\mathcal{L}_n) \cap \mathcal{E}_n\right)}{P(\mathcal{E}_n)}.$$

Let $P_{FN}(A_\theta)$ denote the probability of false negative classification by the system $A_\theta$. Then $P_{FN}(A_\theta)$ is the probability that the classification system $A_\theta$ labels an event outcome, $e$, as a non-target label, $n$, given that the outcome is really a target outcome from the target event set, $\mathcal{E}_t$. Mathematically, $P_{FN}(A_\theta)$ is defined by the conditional probability

$$P_{FN}(A_\theta) = P\{A_\theta(e) = n \mid e \in \mathcal{E}_t\} = \frac{P\left(A_\theta^{\natural}(\mathcal{L}_n) \cap \mathcal{E}_t\right)}{P(\mathcal{E}_t)}.$$
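The four quantifiers can be illustrated on a toy population. The sketch below is only an illustration under stated assumptions: the population is finite with every outcome equally likely (so each conditional probability becomes a ratio of counts), and the names `performance`, `is_target`, and `system` are hypothetical, not taken from the paper.

```python
from fractions import Fraction

def performance(outcomes, is_target, system):
    """Return (P_TP, P_FP, P_TN, P_FN) for one system A_theta.

    Assumes a finite population in which every outcome is equally likely,
    so each conditional probability reduces to a ratio of counts.
    """
    targets = [e for e in outcomes if is_target(e)]          # E_t
    non_targets = [e for e in outcomes if not is_target(e)]  # E_n
    p_tp = Fraction(sum(system(e) == "t" for e in targets), len(targets))
    p_fp = Fraction(sum(system(e) == "t" for e in non_targets), len(non_targets))
    # Each outcome receives exactly one of the two labels, so the
    # negative rates are the complements of the positive rates.
    return p_tp, p_fp, 1 - p_fp, 1 - p_tp

# Hypothetical toy data: outcomes 0..9, true targets are the outcomes >= 5,
# and the system labels an outcome "t" when it is >= 4 (threshold theta = 4).
outcomes = list(range(10))
is_target = lambda e: e >= 5
system = lambda e: "t" if e >= 4 else "n"
quantifiers = performance(outcomes, is_target, system)
print(quantifiers)
```

Sweeping the threshold in `system` and recording the second and first entries of each result gives the operating points from which a ROC curve is built.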

Note that each of these four probabilities depends on the threshold value $\theta$. A single value for each of these probabilities is computed for each value of $\theta$. As the value of $\theta$ changes, so do the values of $P_{FP}(A_\theta)$, $P_{TP}(A_\theta)$, $P_{TN}(A_\theta)$ and $P_{FN}(A_\theta)$. Let $\Theta$ be a set of possible thresholds and define the set of triples

$$\tau_{\mathbb{A}} = \{(\theta, P_{FP}(A_\theta), P_{TP}(A_\theta)) : \theta \in \Theta\}$$

to be the trajectory of $\mathbb{A}$. We can project this trajectory onto the second and third components to yield the set

$$f_{\mathbb{A}} = \{(P_{FP}(A_\theta), P_{TP}(A_\theta)) : \theta \in \Theta\}.$$

If $\Theta$ is homeomorphic to the real numbers $\mathbb{R}$, then the trajectory $\tau_{\mathbb{A}}$ will be a curve in $\mathbb{R}^3$ and the projection $f_{\mathbb{A}}$ will be a curve in $\mathbb{R}^2$ (more specifically, a curve in the unit square $[0,1] \times [0,1]$). Formally, this curve is called the ROC curve for the system family $\mathbb{A}$. An example of this projection is given in Figure 1. For the case when $\Theta$ is discrete, the ROC “curve” is a set of discrete points.

Figure 1. A ROC trajectory (solid) and its projection, the ROC curve (dashed).

If  0  is  a  multi-dimensional  set  then  this  analysis  will 
not  yield  a  single  curve  in  the  Pfp-Ptp  plane.  Instead,  a 
collection  of  curves  is  created.  Therefore,  we  choose  the  upper 
frontier  to  be  the  ROC  curve  as  representative  of  the  classifier 
performance. 

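Definition 1: (ROC function, ROC curve) Let $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ be a family of classification systems defined on the probability space $(\mathcal{E}, \mathfrak{B}, P)$ mapping to the label set $\mathcal{L} = \{t, n\}$ with parameter set $\Theta$. For each $p \in [0,1]$, define the set

$$\Theta_p = \{P_{TP}(A_\theta) : \theta \in \Theta \text{ and } P_{FP}(A_\theta) \le p\}.$$

For $p \in [0,1]$, if $\Theta_p$ is nonempty then define

$$f_{\mathbb{A}}(p) = \max\{P_{TP}(A_\theta) : \theta \in \Theta \text{ and } P_{FP}(A_\theta) \le p\}. \qquad (1)$$

If $\Theta_p$ is empty then $f_{\mathbb{A}}(p)$ is not defined. The function $f_{\mathbb{A}}$ is called the ROC function. The graph of $f_{\mathbb{A}}$ is called the ROC curve.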
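For a discrete threshold set, Definition 1 can be evaluated directly from a list of operating points. The following is a minimal sketch, assuming the points $(P_{FP}, P_{TP})$ have already been measured; the helper name `roc_function` and the sample points are hypothetical.

```python
def roc_function(points):
    """Build f(p) = max{P_TP : P_FP <= p} from (P_FP, P_TP) operating points.

    Returns None for values of p where no operating point satisfies
    P_FP <= p, which is the case in which the ROC function is undefined.
    """
    pts = list(points)

    def f(p):
        feasible = [tp for fp, tp in pts if fp <= p]
        return max(feasible) if feasible else None

    return f

# Hypothetical operating points, one per threshold value.
points = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.5), (0.5, 0.9), (1.0, 1.0)]
f = roc_function(points)
values = [f(p / 10) for p in range(11)]
print(values)
```

Because the feasible set can only grow as `p` increases, the resulting values never decrease, even though the raw points above are not monotone (the point at 0.3 has a lower true positive rate than the point at 0.1).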

In practice, the set $\Theta_p$ may be empty for certain values of $p$. We avoid the discussion of this case and assume that the ROC function is defined for all $p \in [0,1]$. We make this clear by defining a total ROC function.

Definition 2: (Total ROC function, Total ROC curve) We say a ROC curve is total if its ROC function is defined for all $p \in [0,1]$, that is, the ROC function is a total function.

A property of a total ROC curve is given in the following theorem.

Theorem 1: Let $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ be a family of classification systems. Then $f_{\mathbb{A}}$ is a non-decreasing function. That is, for every $p, q \in [0,1]$ with $p \le q$ we have $f_{\mathbb{A}}(p) \le f_{\mathbb{A}}(q)$.


Proof: Let $p, q \in [0,1]$ with $p \le q$. Then $\Theta_p \subseteq \Theta_q$; therefore,

$$f_{\mathbb{A}}(p) = \max_{P_{FP}(A_\theta) \le p} P_{TP}(A_\theta) \le \max_{P_{FP}(A_\theta) \le q} P_{TP}(A_\theta) = f_{\mathbb{A}}(q).$$

Definition 3: (Set of total ROC functions) Let the set of total ROC functions be denoted by

$$\mathcal{R} = \{f : [0,1] \to [0,1] \mid f \text{ is non-decreasing on } [0,1]\}.$$

Notice that we do not require continuity of the functions.

We write $f = g$ to mean point-wise equality, that is, $f(p) = g(p)$ for all $p \in [0,1]$.

III. Fusion Rules

There are two types of fusion for classification systems. The first type allows for the families of classification systems which are to be fused to have exactly the same label set. We mean exactly the same, and not isomorphic, so that if the label set is, in fact, $\mathcal{L} = \{\text{target}, \text{non-target}\}$ for each family, then the actual definition of a target label is identical for each. This allows each family to partition the population set in the same way. This type of information fusion we call within-fusion [1].

In the diagram for label fusion of two systems, an outcome $e \in \mathcal{E}$ is mapped to the pair of labels $(A_\theta(e), B_\phi(e)) \in \mathcal{L} \times \mathcal{L}$, which the fusion rule then maps to a single label in $\mathcal{L}$.

Although many label-fusion rules exist, in this paper we focus on the Boolean OR and AND rules. These straightforward, “hard” rules will be used to develop a mathematical expression for the ROC curve of the fused classification system using only properties of the ROC curves from the individual systems. In this manner, if we know the performance of the individual systems, we can compute the performance of the fused system using these Boolean label-fusion rules without any replication in experimentation.

Let the ROC curve associated with the classification system family $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ be denoted by $f_{\mathbb{A}}$ and the ROC curve associated with the classification system family $\mathbb{B} = \{B_\phi : \phi \in \Phi\}$ be denoted by $f_{\mathbb{B}}$. Recall that the label set is $\mathcal{L} = \{t, n\}$.

1) AND Rule: The AND (conjunction) rule is a binary operation defined on $\mathcal{L}$. We denote the AND operation by the meet symbol $\wedge$. Its definition is given in the table:

$$\begin{array}{c|cc} \wedge & t & n \\ \hline t & t & n \\ n & n & n \end{array}$$

The new classification system $A_\theta \wedge B_\phi$ is defined by the point-wise AND operation on its output, that is,

$$[A_\theta \wedge B_\phi](e) = A_\theta(e) \wedge B_\phi(e) \quad \text{for all } e \in \mathcal{E}. \qquad (2)$$

This produces a new classification system family $\mathbb{C}^{\mathrm{AND}} = \{A_\theta \wedge B_\phi : \theta \in \Theta, \phi \in \Phi\}$. Thus, to be labeled as “target”, the labels from both systems $A_\theta$ and $B_\phi$ must be the “target” label. For brevity we write $\mathbb{A} \wedge \mathbb{B} = \mathbb{C}^{\mathrm{AND}}$, thus using the AND symbol $\wedge$ to represent the binary label AND operation (e.g., $t \wedge n$), the binary system AND operation (e.g., $A_\theta \wedge B_\phi$), and the binary family AND operation (e.g., $\mathbb{A} \wedge \mathbb{B}$).

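2) OR Rule: The OR (disjunction) rule is also a binary operation defined on $\mathcal{L}$. We denote the OR operation by the join symbol $\vee$. Its definition is given in the table:

$$\begin{array}{c|cc} \vee & t & n \\ \hline t & t & t \\ n & t & n \end{array}$$

Then the new classification system $A_\theta \vee B_\phi$ is defined by the point-wise OR operation

$$[A_\theta \vee B_\phi](e) = A_\theta(e) \vee B_\phi(e) \quad \text{for all } e \in \mathcal{E} \qquad (3)$$

and yields a new classification system family $\mathbb{C}^{\mathrm{OR}} = \{A_\theta \vee B_\phi : \theta \in \Theta, \phi \in \Phi\}$. Thus, to be labeled as “target”, the label from either system $A_\theta$ or system $B_\phi$ must be the “target” label. For brevity we write $\mathbb{A} \vee \mathbb{B} = \mathbb{C}^{\mathrm{OR}}$.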

In  comparison,  the  AND  rule  is  more  conservative  than 
the  OR  rule  in  labeling  of  an  object  as  target.  If  there  are 
negative  consequences  in  being  labeled  as  a  target,  then  a  more 
conservative  rule  may  be  warranted  in  order  to  avoid  excessive 
false  positives.  In  the  case  of  disease  detection,  however, 
the  OR  rule  may  be  warranted  in  preventative  screening,  for 
instance,  in  order  to  avoid  excessive  false  negatives  and  failure 
to  diagnose  a  disease  at  a  potentially  earlier  and  treatable  stage 
of  development. 

3) NOT Rule: The NOT (negation or complementation) rule is a unary operation defined on $\mathcal{L}$. We denote the NOT operation by $\neg$. Its definition is given in the table:

$$\begin{array}{c|c} a & \neg a \\ \hline t & n \\ n & t \end{array}$$

Then the new classification system $\neg A_\theta$ is defined by the point-wise NOT operation

$$[\neg A_\theta](e) = \neg\left(A_\theta(e)\right) \quad \text{for all } e \in \mathcal{E} \qquad (4)$$

and yields a new classification system family $\neg\mathbb{A} = \{\neg A_\theta : \theta \in \Theta\}$. Thus, to be labeled as “target” by $\neg A_\theta$, the label from system $A_\theta$ must be “non-target”. Clearly, the NOT rule is a unary rule and not a fusion rule, but it will be useful.
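The three label rules and their point-wise lifting to systems can be sketched in a few lines. This is an illustrative sketch only: the systems `A` and `B` below are hypothetical threshold classifiers acting directly on a numeric outcome, and the helper `fuse` is a name introduced here, not one from the paper.

```python
T, N = "t", "n"

def label_and(x, y):
    """AND rule: target only when both labels are target."""
    return T if (x == T and y == T) else N

def label_or(x, y):
    """OR rule: target when at least one label is target."""
    return T if (x == T or y == T) else N

def label_not(x):
    """NOT rule: swap the two labels."""
    return N if x == T else T

def fuse(rule, *systems):
    """Lift a label rule to systems point-wise: e -> rule(A(e), B(e), ...)."""
    return lambda e: rule(*(s(e) for s in systems))

# Hypothetical systems A_theta and B_phi with thresholds 0.3 and 0.6.
A = lambda e: T if e > 0.3 else N
B = lambda e: T if e > 0.6 else N

A_and_B = fuse(label_and, A, B)
A_or_B = fuse(label_or, A, B)
not_A = fuse(label_not, A)
print(A_and_B(0.5), A_or_B(0.5), not_A(0.5))
```

For the outcome 0.5, only `A` reports a target, so the AND system withholds the target label while the OR system assigns it, which matches the remark that AND is the more conservative rule.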

IV.  Boolean  Algebras  of  a  finite  collection  of 
Families  of  Classification  Systems 

A.  Boolean  Algebras 

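The definition of a Boolean Algebra is given below [6].

Definition 4: A Boolean Algebra is an algebraic structure, denoted by $(\mathcal{A}, =, \wedge, \vee, \neg)$, where

$\mathcal{A}$ is a nonempty set of elements;

$=$ denotes element equality;

$\wedge$ is a binary operation called AND or conjunction;

$\vee$ is a binary operation called OR or disjunction;

$\neg$ is a unary operation called NOT or negation (or complementation);

and the following axioms hold true:

1) $\mathcal{A}$ is closed w.r.t. $\wedge$, $\vee$ and $\neg$. For every $a, b \in \mathcal{A}$,

$$a \wedge b \in \mathcal{A}, \qquad a \vee b \in \mathcal{A}, \qquad \neg a \in \mathcal{A}.$$

2) $\mathcal{A}$ is associative w.r.t. $\wedge$ and $\vee$. For every $a, b, c \in \mathcal{A}$,

$$(a \wedge b) \wedge c = a \wedge (b \wedge c), \qquad (a \vee b) \vee c = a \vee (b \vee c).$$

3) $\mathcal{A}$ is commutative w.r.t. $\wedge$ and $\vee$. For every $a, b \in \mathcal{A}$,

$$a \wedge b = b \wedge a, \qquad a \vee b = b \vee a.$$

4) $\mathcal{A}$ has unique identities w.r.t. $\wedge$ and $\vee$. There exist unique elements $l, u \in \mathcal{A}$ such that for every $a \in \mathcal{A}$,

$$a \wedge u = a, \qquad a \vee l = a.$$

5) $\mathcal{A}$ is absorptive w.r.t. $\wedge$ and $\vee$. For every $a, b \in \mathcal{A}$,

$$a \wedge (a \vee b) = a, \qquad a \vee (a \wedge b) = a.$$

6) $\mathcal{A}$ is distributive w.r.t. $\wedge$ and $\vee$. For every $a, b, c \in \mathcal{A}$,

$$a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c),$$

$$a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c).$$

7) $\mathcal{A}$ contains complements w.r.t. $\wedge$ and $\vee$. For every $a \in \mathcal{A}$,

$$a \wedge \neg a = l, \qquad a \vee \neg a = u.$$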

There  are  several  other  properties  that  follow  from  these 
axioms,  see  [6]  for  a  larger  list. 
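As a sanity check on the axioms, they can be verified by brute force on the smallest interesting case, the two-element label set with the AND, OR and NOT rules of Section III. The sketch below assumes the identities are $u = t$ and $l = n$; the dictionary `checks` and its keys are names chosen here for illustration.

```python
from itertools import product

L = ("t", "n")
AND = lambda a, b: "t" if a == b == "t" else "n"
OR = lambda a, b: "n" if a == b == "n" else "t"
NOT = lambda a: "n" if a == "t" else "t"
u, l = "t", "n"   # assumed identities for AND and OR, respectively

pairs = list(product(L, repeat=2))
triples = list(product(L, repeat=3))

checks = {
    "closed": all(AND(a, b) in L and OR(a, b) in L and NOT(a) in L
                  for a, b in pairs),
    "associative": all(AND(AND(a, b), c) == AND(a, AND(b, c)) and
                       OR(OR(a, b), c) == OR(a, OR(b, c))
                       for a, b, c in triples),
    "commutative": all(AND(a, b) == AND(b, a) and OR(a, b) == OR(b, a)
                       for a, b in pairs),
    "identities": all(AND(a, u) == a and OR(a, l) == a for a in L),
    "absorptive": all(AND(a, OR(a, b)) == a and OR(a, AND(a, b)) == a
                      for a, b in pairs),
    "distributive": all(AND(a, OR(b, c)) == OR(AND(a, b), AND(a, c)) and
                        OR(a, AND(b, c)) == AND(OR(a, b), OR(a, c))
                        for a, b, c in triples),
    "complements": all(AND(a, NOT(a)) == l and OR(a, NOT(a)) == u for a in L),
}
print(checks)
```

Since the system and family operations are defined point-wise from these label operations, this two-element check is the base case behind the Boolean algebra of families discussed next.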

B. Boolean Algebra Generated from a finite number of Classification System Families

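There are two special total classification system families of interest: the “target” family $\mathbb{T}$ and the “non-target” family $\mathbb{N}$,

$$\mathbb{T} = \{T_\alpha : \alpha \in [0,1]\} \quad \text{with } T_\alpha(e) = t \text{ for all } e \in \mathcal{E},$$

$$\mathbb{N} = \{N_\beta : \beta \in [0,1]\} \quad \text{with } N_\beta(e) = n \text{ for all } e \in \mathcal{E}.$$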

A result that is straightforward is the following.

Theorem 2: Let $\mathcal{S}$ denote the collection of total classification system families defined on the probability space $(\mathcal{E}, \mathfrak{B}, P)$ mapping to the label set $\mathcal{L} = \{t, n\}$ such that each system (in each family) is measurable with respect to $\mathfrak{B}$. Then $(\mathcal{S}, =, \wedge, \vee, \neg)$ is a Boolean Algebra.

Suppose one has $K$ distinct total classification system families, denoted by $\mathcal{W} = \{\mathbb{A}^{(1)}, \mathbb{A}^{(2)}, \ldots, \mathbb{A}^{(K)}\}$. More specifically, assume each $\mathbb{A}^{(k)}$ cannot be produced from the other families by using the AND, OR and NOT operations. From $\mathcal{W}$ one can generate a Boolean algebra of total classification system families, denoted by $\mathcal{B}(\mathcal{W})$, by taking all possible combinations of AND, OR, and NOT. This Boolean algebra is called a Free Boolean Algebra; $\mathcal{W}$ is called the generator and acts like an “independent” set (in the same fashion as linearly independent sets do in vector spaces). From the axioms and identities, the cardinality of $\mathcal{B}(\mathcal{W})$ is $2^{2^K}$ [7].
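The count $2^{2^K}$ can be checked for small $K$ by closing $K$ free generators under AND, OR and NOT. The sketch below is an illustration under an assumption of convenience: each element is represented by its truth table over the $2^K$ assignments of the generators, stored as an integer bitmask. The function name `free_boolean_algebra_size` is hypothetical.

```python
def free_boolean_algebra_size(k):
    """Count the elements generated from k free generators by AND, OR, NOT.

    Each element is a truth table over the 2**k generator assignments,
    encoded as a bitmask with one bit per assignment (row).
    """
    n_rows = 2 ** k
    full = (1 << n_rows) - 1
    # Generator i is "target" on every row whose i-th bit is set.
    gens = {sum(1 << row for row in range(n_rows) if (row >> i) & 1)
            for i in range(k)}
    elements = set(gens)
    frontier = set(gens)
    while frontier:
        new = set()
        for x in frontier:
            new.add(full ^ x)            # NOT
            for y in elements:
                new.add(x & y)           # AND
                new.add(x | y)           # OR
        frontier = new - elements        # keep only elements not seen before
        elements |= frontier
    return len(elements)

sizes = [free_boolean_algebra_size(k) for k in (1, 2, 3)]
print(sizes)
```

For three generator families, as in the example of Section VI, the closure therefore contains 256 elements, many of which are trivial or differ only by negation.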

The main assumption in this paper is that the classification systems are independent, that is, the occurrence or non-occurrence of an event classified by one system will not affect the occurrence or non-occurrence of another event classified by the other system. We derive expressions for the probability of true and false positive for the OR and AND label-fusion rules that can be simplified using the following definition.

Definition 5: (Independent Classification Systems) Let $(\mathcal{E}, \mathfrak{B}, P)$ be a probability space. Let $\mathcal{L} = \{t, n\}$ be a label set. Let $A, B : \mathcal{E} \to \mathcal{L}$ be two classification systems. We say that the systems $A$, $B$ are system independent if they are statistically independent as random variables. Thus, the collection of pre-image events is independent, so that

$$P\left(A^{\natural}(\{\ell\}) \cap B^{\natural}(\{\ell\})\right) = P\left(A^{\natural}(\{\ell\})\right) P\left(B^{\natural}(\{\ell\})\right)$$

for all $\ell \in \{t, n\}$.

We apply this notion of independence to pre-images of the classification systems. Recall the systems $A_\theta = a_\theta \circ p_1 \circ s_1$ for each $\theta \in \Theta$ and $B_\phi = b_\phi \circ p_2 \circ s_2$ for each $\phi \in \Phi$. These compositions take sets of outcomes from the event set and map them to sets of labels in the label set. The pre-images of the non-target label sets ($A_\theta^{\natural}(\mathcal{L}_n)$ and $B_\phi^{\natural}(\mathcal{L}_n)$) trace the mappings back to corresponding sets in the sample space. Thus, if classification systems $A_\theta$ and $B_\phi$ are independent, their pre-images will be independent; for example,

$$P\left(A_\theta^{\natural}(\{n\}) \cap B_\phi^{\natural}(\{n\})\right) = P\left(A_\theta^{\natural}(\{n\})\right) P\left(B_\phi^{\natural}(\{n\})\right).$$
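The product condition can be checked exactly on a small finite probability space. The sketch below is only an illustration: it assumes a uniform measure on a product set of outcomes, with one hypothetical system reading the first coordinate and the other reading the second, and it tests every pair of labels, which is what independence as random variables requires. The names `preimage` and `independent` are introduced here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical event set: 20 equally likely outcomes (i, j).
outcomes = list(product(range(4), range(5)))
P = lambda event: Fraction(len(event), len(outcomes))

A = lambda e: "t" if e[0] >= 1 else "n"   # depends only on the first coordinate
B = lambda e: "t" if e[1] >= 3 else "n"   # depends only on the second coordinate

def preimage(system, label):
    """Set of outcomes that the system maps to the given label."""
    return {e for e in outcomes if system(e) == label}

def independent(sys_a, sys_b):
    """True when every pair of pre-image events satisfies the product rule."""
    return all(
        P(preimage(sys_a, la) & preimage(sys_b, lb))
        == P(preimage(sys_a, la)) * P(preimage(sys_b, lb))
        for la in "tn" for lb in "tn"
    )

result_ab = independent(A, B)
result_aa = independent(A, A)
print(result_ab, result_aa)
```

A system paired with itself fails the test, a reminder that the ROC formulas of the next section do not apply when the same family is fused with itself.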

V. Boolean Algebra of ROC Curves
A.  AND  Label-Fusion  ROC  Formula 

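Consider the development of the probabilities of true and false positive ($P_{TP}(\mathbb{C}^{\mathrm{AND}})$ and $P_{FP}(\mathbb{C}^{\mathrm{AND}})$, respectively) for the AND label-fusion rule under the assumption of independence.

Theorem 3: (AND Label-Fusion ROC Formula) Let $(\mathcal{E}, \mathfrak{B}, P)$ be a probability space and $\mathcal{L} = \{t, n\}$ be a label set. Let $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ and $\mathbb{B} = \{B_\phi : \phi \in \Phi\}$ be measurable and independent families of classification systems with admissible parameter sets, designed to classify the same target outcomes in $\mathcal{E}$. Let $f_{\mathbb{A}}$ and $f_{\mathbb{B}}$ denote their corresponding ROC curves. Let $\mathbb{A} \wedge \mathbb{B}$ be the resulting family of classification systems. Then the ROC curve $f_{\mathbb{A} \wedge \mathbb{B}}$ is given by

$$f_{\mathbb{A} \wedge \mathbb{B}}(r) = \max_{\substack{p, q \in [0,1] \\ pq = r}} f_{\mathbb{A}}(p)\, f_{\mathbb{B}}(q) \qquad (5)$$

for every $r \in [0,1]$. Furthermore,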


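Proof of this formula can be found in [1], [8] and [9]. This formula motivates the definition of the transformation $\mathcal{T}_\wedge$, which is associated with the AND operation $\wedge$ and acts on the ROC curves. Specifically, given two ROC curves $f, g \in \mathcal{R}$ and for each $r \in [0,1]$ we define

$$[\mathcal{T}_\wedge(f, g)](r) = \begin{cases} \max\limits_{\substack{p, q \in [0,1] \\ pq = r}} f(p)\, g(q) & \text{for } f \neq g, \\ f(r) & \text{for } f = g. \end{cases} \qquad (6)$$

This transformation can be shown to have the following properties:

1) (closure) $\mathcal{T}_\wedge$ is defined on all of $\mathcal{R} \times \mathcal{R}$, i.e., $\mathcal{D}(\mathcal{T}_\wedge) = \mathcal{R} \times \mathcal{R}$.

2) (associative) $\mathcal{T}_\wedge(f, \mathcal{T}_\wedge(g, h)) = \mathcal{T}_\wedge(\mathcal{T}_\wedge(f, g), h)$ for all $f, g, h \in \mathcal{R}$.

3) (symmetric) $\mathcal{T}_\wedge(f, g) = \mathcal{T}_\wedge(g, f)$ for all $f, g \in \mathcal{R}$.

4) (identity) $\mathcal{T}_\wedge(f, 1) = f$ for all $f \in \mathcal{R}$.

5) (idempotent) $\mathcal{T}_\wedge(f, f) = f$ for all $f \in \mathcal{R}$.

6) (minimum element) $\mathcal{T}_\wedge(f, 0) = 0$ for all $f \in \mathcal{R}$.

This shows that $\mathcal{T}_\wedge$ is a binary operation, and motivates the creation of a new symbol that represents it. Given $f, g \in \mathcal{R}$ we will write

$$f \cap g = \mathcal{T}_\wedge(f, g).$$

We read $f \cap g$ as “$f$ and $g$”. We use a different symbol since $\wedge$ is the binary operation dealing with classification systems and $\cap$ deals with ROC functions.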
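The maximization in Equation (6) can be approximated on a grid. The sketch below is a rough numerical illustration, not the authors' procedure: it covers only the branch for two different ROC functions, it parametrizes the constraint $pq = r$ by $p \in [r, 1]$ with $q = r/p$, and the sample curves and the name `roc_and` are hypothetical.

```python
import math

def roc_and(f, g, r, n=2001):
    """Approximate max of f(p) * g(q) over p, q in [0, 1] with p * q = r.

    Uses a uniform grid of n values of p in [r, 1]; q = r / p then stays
    in [0, 1]. Only the branch for f != g of the transformation is covered.
    """
    if r == 0.0:
        # p * q = 0 forces p = 0 or q = 0; f and g are non-decreasing,
        # so the free argument is best set to 1.
        return max(f(0.0) * g(1.0), f(1.0) * g(0.0))
    best = 0.0
    for i in range(n):
        p = min(1.0, r + (1.0 - r) * i / (n - 1))
        q = r / p
        best = max(best, f(p) * g(q))
    return best

# Hypothetical ROC functions: a concave curve and the chance line.
f = lambda p: math.sqrt(p)
g = lambda p: p
and_value = roc_and(f, g, 0.25)
print(and_value)
```

With these two curves the product equals $r / \sqrt{p}$ along the constraint, so the maximum sits at $p = r$, $q = 1$ and the fused value at $r = 0.25$ is $\sqrt{0.25} = 0.5$. The grid resolution `n` is an arbitrary choice and limits the accuracy for curves whose maximum lies between grid points.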


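B. OR Label-Fusion ROC Formula

Theorem 4: (OR Label-Fusion ROC Formula) Let $(\mathcal{E}, \mathfrak{B}, P)$ be a probability space and $\mathcal{L} = \{t, n\}$ be a label set. Let $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ and $\mathbb{B} = \{B_\phi : \phi \in \Phi\}$ be two measurable, independent families of classification systems with admissible parameter sets, designed to classify the same target outcomes in $\mathcal{E}$. Let $f_{\mathbb{A}}$ and $f_{\mathbb{B}}$ denote their corresponding ROC curves. Let $\mathbb{A} \vee \mathbb{B}$ be the resulting family of classification systems. Then the ROC curve $f_{\mathbb{A} \vee \mathbb{B}}$ is given by

$$f_{\mathbb{A} \vee \mathbb{B}}(r) = \max_{\substack{p, q \in [0,1] \\ p + q - pq = r}} \left[ f_{\mathbb{A}}(p) + f_{\mathbb{B}}(q) - f_{\mathbb{A}}(p)\, f_{\mathbb{B}}(q) \right] \qquad (7)$$

for every $r \in [0,1]$. Furthermore,

Proof of this formula can be found in [1].

There is a second transformation, $\mathcal{T}_\vee$, associated with the OR operation $\vee$ that acts on ROC functions. Specifically, given two ROC curves $f, g \in \mathcal{R}$ and for each $r \in [0,1]$ we define

$$[\mathcal{T}_\vee(f, g)](r) = \begin{cases} \max\limits_{\substack{p, q \in [0,1] \\ p + q - pq = r}} \left[ f(p) + g(q) - f(p)\, g(q) \right], & f \neq g, \\ f(r), & f = g. \end{cases}$$

We list some properties of the transformation $\mathcal{T}_\vee$.

1) (closure) $\mathcal{T}_\vee$ is defined on all of $\mathcal{R} \times \mathcal{R}$, that is, $\mathcal{D}(\mathcal{T}_\vee) = \mathcal{R} \times \mathcal{R}$.

2) (associative) $\mathcal{T}_\vee(f, \mathcal{T}_\vee(g, h)) = \mathcal{T}_\vee(\mathcal{T}_\vee(f, g), h)$ for all $f, g, h \in \mathcal{R}$.

3) (symmetric) $\mathcal{T}_\vee(f, g) = \mathcal{T}_\vee(g, f)$ for all $f, g \in \mathcal{R}$.

4) (identity) $\mathcal{T}_\vee(f, 0) = f$ for all $f \in \mathcal{R}$.

5) (idempotent) $\mathcal{T}_\vee(f, f) = f$ for all $f \in \mathcal{R}$.

6) (maximal element) $\mathcal{T}_\vee(f, 1) = 1$ for all $f \in \mathcal{R}$.

We now have that $\mathcal{T}_\vee$ is a binary operation, which motivates the creation of a new symbol that represents this binary operation. Given $f, g \in \mathcal{R}$ we will write

$$f \cup g = \mathcal{T}_\vee(f, g).$$

We read $f \cup g$ as “$f$ or $g$”. We use the symbol $\cup$ rather than $\vee$ in order to distinguish it from the operation dealing with classification systems.

C. NOT ROC Formula

Given the family $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ with ROC curve $f_{\mathbb{A}}$, what is the ROC curve for $\neg\mathbb{A}$? We have from Equation (4) that

$$\neg\mathbb{A} = \{\neg A_\theta : \theta \in \Theta\}$$

and

$$[\neg A_\theta](e) = \neg\left(A_\theta(e)\right) \quad \text{for every } e \in \mathcal{E},$$

and, since $\mathcal{E} = A_\theta^{\natural}(\mathcal{L}_t) \cup A_\theta^{\natural}(\mathcal{L}_n)$, we have by the disjunctive properties:

$$P_{TP}(\neg A_\theta) = \frac{P\left([\neg A_\theta]^{\natural}(\mathcal{L}_t) \cap \mathcal{E}_t\right)}{P(\mathcal{E}_t)} = \frac{P\left(A_\theta^{\natural}(\mathcal{L}_n) \cap \mathcal{E}_t\right)}{P(\mathcal{E}_t)} = P_{FN}(A_\theta) = 1 - P_{TP}(A_\theta),$$

$$P_{FP}(\neg A_\theta) = \frac{P\left([\neg A_\theta]^{\natural}(\mathcal{L}_t) \cap \mathcal{E}_n\right)}{P(\mathcal{E}_n)} = \frac{P\left(A_\theta^{\natural}(\mathcal{L}_n) \cap \mathcal{E}_n\right)}{P(\mathcal{E}_n)} = P_{TN}(A_\theta) = 1 - P_{FP}(A_\theta).$$

Given $p \in [0,1]$, then

$$\begin{aligned}
f_{\neg\mathbb{A}}(p) &= \max\{P_{TP}(\neg A_\theta) : P_{FP}(\neg A_\theta) \le p\} \\
&= \max\{1 - P_{TP}(A_\theta) : 1 - P_{FP}(A_\theta) \le p\} \\
&= 1 - \min\{P_{TP}(A_\theta) : P_{FP}(A_\theta) \ge 1 - p\} \\
&= 1 - \max\{P_{TP}(A_\theta) : P_{FP}(A_\theta) \le 1 - p\} \\
&= 1 - f_{\mathbb{A}}(1 - p).
\end{aligned}$$

Observe that $f_{\neg\mathbb{A}}$ will be non-decreasing and hence will satisfy the condition to be in $\mathcal{R}$.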

Theorem 5: (NOT Label-Fusion ROC Formula) Let $(\mathcal{E}, \mathfrak{B}, P)$ be a probability space and $\mathcal{L} = \{t, n\}$ be a label set. Let $\mathbb{A} = \{A_\theta : \theta \in \Theta\}$ be a total family of classification systems. Let $f_{\mathbb{A}}$ denote its corresponding ROC curve. Let $\neg\mathbb{A} = \{\neg A_\theta : \theta \in \Theta\}$ be the family of classification systems resulting from the NOT operation. Then the ROC function $f_{\neg\mathbb{A}}$ is given by

$$f_{\neg\mathbb{A}}(p) = 1 - f_{\mathbb{A}}(1 - p) \qquad (8)$$

for every $p \in [0,1]$.

This motivates the operator $\mathcal{N}$ that acts on ROC curves $f \in \mathcal{R}$, defined by

$$[\mathcal{N}(f)](p) = 1 - f(1 - p)$$

for every $p \in [0,1]$.

The operator $\mathcal{N}$ satisfies the following properties:

1) (closure) $\mathcal{N}$ is defined on all of $\mathcal{R}$, that is, $\mathcal{D}(\mathcal{N}) = \mathcal{R}$.

2) (involution) $\mathcal{N}(\mathcal{N}(f)) = f$ for all $f \in \mathcal{R}$.

3) (identity) $\mathcal{N}(0) = 1$ and $\mathcal{N}(1) = 0$.

We now have that $\mathcal{N}$ is a unary operation that acts like a negation, which motivates the creation of a new NOT symbol acting on ROC functions (and consequently ROC curves). Given $f \in \mathcal{R}$ we will write

$$\neg f = \mathcal{N}(f).$$

We read $\neg f$ as “not $f$”.
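The OR transformation of Equation (7) and the NOT operator of Equation (8) can be sketched numerically in the same spirit. This is an illustrative approximation under stated assumptions: the OR constraint $p + q - pq = r$ is parametrized by $p \in [0, r]$ with $q = (r - p)/(1 - p)$, only the branch for two different ROC functions is covered, and the sample curves and the names `roc_or` and `roc_not` are hypothetical.

```python
import math

def roc_or(f, g, r, n=2001):
    """Approximate max of f(p) + g(q) - f(p) g(q) over p + q - p q = r.

    The objective is rewritten as 1 - (1 - f(p)) * (1 - g(q)).
    """
    if r >= 1.0:
        # (1 - p)(1 - q) = 0 allows p = q = 1, which is optimal for
        # non-decreasing f and g.
        return 1.0 - (1.0 - f(1.0)) * (1.0 - g(1.0))
    best = 0.0
    for i in range(n):
        p = r * i / (n - 1)
        q = max(0.0, (r - p) / (1.0 - p))   # guard against rounding below 0
        best = max(best, 1.0 - (1.0 - f(p)) * (1.0 - g(q)))
    return best

def roc_not(f):
    """ROC function of the negated family: p -> 1 - f(1 - p)."""
    return lambda p: 1.0 - f(1.0 - p)

# Hypothetical ROC functions: a concave curve and the chance line.
f = lambda p: math.sqrt(p)
g = lambda p: p
or_value = roc_or(f, g, 0.25)
not_value = roc_not(f)(0.75)
print(or_value, not_value)
```

Here `roc_not(f)` lies below the chance line wherever `f` lies above it, for instance $1 - \sqrt{1 - 0.75} = 0.5 < 0.75$, which is consistent with the later remark that negated families are not useful on their own.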

Theorem 6: (ROC Boolean Algebra) $(\mathcal{R}, =, \cup, \cap, \neg)$ is a Boolean Algebra of ROC curves.

Our main result is the following theorem.

Theorem 7: Let $\mathcal{W} = \{\mathbb{A}^{(1)}, \mathbb{A}^{(2)}, \ldots, \mathbb{A}^{(K)}\}$ be a collection of $K$ families of total classification systems that are mutually independent. Let $(\mathcal{B}(\mathcal{W}), =, \wedge, \vee, \neg)$ denote the Boolean Algebra of total, independent classification system families generated by $\mathcal{W}$. Let $\mathcal{H} = \{f_{\mathbb{A}^{(1)}}, f_{\mathbb{A}^{(2)}}, \ldots, f_{\mathbb{A}^{(K)}}\}$ be the collection of $K$ ROC curves corresponding to $\mathcal{W}$. Then $(\mathcal{B}(\mathcal{H}), =, \cap, \cup, \neg)$ is a Boolean Algebra of ROC curves that is isomorphic to $(\mathcal{B}(\mathcal{W}), =, \wedge, \vee, \neg)$.

The proof of this theorem is too long for these conference proceedings. We motivate its usefulness with the example in the next section.


Figure 2. The ROC curves $f_{\mathbb{A}}$ (red), $f_{\mathbb{B}}$ (blue), and $f_{\mathbb{C}}$ (green).


VI.  Example 


Consider  a  problem  where  we  know  the  ROC  curves 
for  the  three  independent  classification  systems  families  A, 
®,  and  C.  Assume  /a(p)  =  P^^^,  /b(p)  =  tanh(4p),  and 
fc{p)  =  2p^/^-^  —  p^/^-^,  see  Figure  (2)  for  their  graphs. 
Observe that no single ROC curve completely dominates the others. The independent families of interest generated by $\{\mathbb{A}, \mathbb{B}, \mathbb{C}\}$ are given in the table below. We do not use the NOT of these families, since their ROC curves will fall below the chance line, implying poor performance.


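$$\begin{array}{lll}
\text{single} & \text{pairs} & \text{triples} \\ \hline
\mathbb{A} & \mathbb{A} \wedge \mathbb{B} & (\mathbb{B} \wedge \mathbb{C}) \wedge \mathbb{A} \\
\mathbb{B} & \mathbb{A} \wedge \mathbb{C} & (\mathbb{A} \wedge \mathbb{B}) \vee \mathbb{A} \\
\mathbb{C} & \mathbb{B} \wedge \mathbb{C} & (\mathbb{A} \wedge \mathbb{C}) \vee \mathbb{A} \\
 & \mathbb{A} \vee \mathbb{B} & (\mathbb{B} \wedge \mathbb{C}) \vee \mathbb{A} \\
 & \mathbb{A} \vee \mathbb{C} & (\mathbb{B} \vee \mathbb{C}) \vee \mathbb{A} \\
 & \mathbb{B} \vee \mathbb{C} & (\mathbb{A} \wedge \mathbb{B}) \vee \mathbb{B} \\
 & & (\mathbb{A} \wedge \mathbb{C}) \vee \mathbb{B} \\
 & & (\mathbb{B} \wedge \mathbb{C}) \vee \mathbb{B} \\
 & & (\mathbb{A} \wedge \mathbb{B}) \vee \mathbb{C} \\
 & & (\mathbb{A} \wedge \mathbb{C}) \vee \mathbb{C} \\
 & & (\mathbb{B} \wedge \mathbb{C}) \vee \mathbb{C}
\end{array}$$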

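The majority vote family is

$$\mathbb{V} = (\mathbb{A} \wedge \mathbb{B}) \vee (\mathbb{A} \wedge \mathbb{C}) \vee (\mathbb{B} \wedge \mathbb{C}) = (\mathbb{A} \vee \mathbb{B}) \wedge (\mathbb{A} \vee \mathbb{C}) \wedge (\mathbb{B} \vee \mathbb{C}).$$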


We compute the ROC curves for all these families using the formulas given in Equations (5) and (7) and plot them all in Figure 3. It appears that the majority vote family dominates all the other families, but upon closer inspection we see that for small false positive values ($< 0.03$) the majority vote is not the best. If one chooses different ROC curves then all these curves will change, and the majority vote may not dominate as much. Further research will be performed to determine this dependency.


Figure 3. The ROC curves of all the families in the Boolean Algebra generated by $f_{\mathbb{A}}$ (red), $f_{\mathbb{B}}$ (blue), and $f_{\mathbb{C}}$ (green). The black curves are the ROC curves generated using Equations (5) and (7). The arrow points to the curve that corresponds to the majority vote family.


VII.  Conclusions 

We have shown that for label fusion of a finite number of classification system families, we can start with the Boolean rules AND, OR, and NOT. From these we develop a Boolean Algebra of classification system families. Under the assumption of independence of these families, this Boolean Algebra of classification system families is represented by (in fact, is isomorphic to) a Boolean Algebra of ROC curves. The ROC Boolean Algebra is constructed using the ROC curves of the original families. There are several possible uses for this algebra. One is that, by calculating all the elements of the ROC Boolean Algebra from the original families, the cost of testing a Boolean decision rule is virtually zero. A second use is that, if we are considering a non-Boolean decision rule or a fusion rule built from the data set level or feature set level of the classification systems, then by calculating the entire Boolean Algebra and taking the frontier of the resulting set of ROC curves, we can construct a bound against which to compare the performance of any other fusion rule, deterministic or randomized (see Thorsen [10]). That is, if the new fusion rule does not outperform any Boolean rule, then why use it?

The applications of this procedure are manifold. A researcher is empowered to leverage legacy classification systems in ways he or she may not have thought of before, by using completely constructive testing based on the ROC curves of the legacy systems.


and  the  Air  Combat  Command  (ACC/DRCA)  at  Langley  Air 
Force  Base,  Virginia. 

The  views  expressed  in  this  article  are  those  of  the 
authors  and  do  not  reflect  the  official  policy  or  position  of 
the  United  States  Air  Force,  Department  of  Defense,  or  the 
US  Government. 